2026-06-30

Flow Matching

Flow Matching is a simulation-free approach to train continuous normalizing flows by directly regressing on a vector field that generates a desired probability path. Unlike [[Diffusion Model|diffusion models]] that rely on [[Stochastic Differential Equation (SDE)|SDE]] theory, Flow Matching provides a simpler and more flexible framework for generative modeling using ordinary differential equations.

1. Core Concept

1.1 Motivation

Problems with existing methods:

[[Diffusion Model|Diffusion Models]]:
- Require complex [[Stochastic Differential Equation (SDE)|SDE]]/ODE theory
- Derivations rely on the Fokker-Planck equation
- Constrained by specific noise schedules
[[Continuous Normalizing Flow]]:
- Likelihood computation is expensive
- Training requires simulating ODE trajectories
- Limited architectural choices
Score-Based Models:
- Require score matching objectives
- Complex mathematical derivation

Flow Matching solves these by:

Direct vector field regression (no simulation needed)
Flexible conditional flow design
Simpler mathematical foundation
Connections to optimal transport

1.2 Key Idea

Instead of deriving the ODE from a stochastic process, Flow Matching directly learns a velocity field that transports samples from a simple distribution (e.g., Gaussian) to the data distribution.

\frac{d x}{d t} = v_{θ} (x, t)

where $v_{θ} (x, t)$ is a time-dependent velocity field parameterized by a neural network with weights $θ$.

> [!NOTE] Core Insight
> Flow Matching bypasses the need for [[Stochastic Differential Equation (SDE)|SDE]] theory and score matching by directly learning the velocity field that generates the desired probability flow, making it conceptually simpler and more flexible.

2. Mathematical Foundation

2.1 Continuous Normalizing Flows

A continuous normalizing flow (CNF) is defined by an ODE:

\frac{d x}{d t} = v (x, t), x (0) \sim p_{0}, x (1) \sim p_{1}

where:

$p_{0}$ : Simple prior distribution (e.g., $N (0, I)$ )
$p_{1}$ : Target data distribution
$v (x, t)$ : Time-dependent velocity field

The probability path $p_{t} (x)$ induced by $v$ evolves according to the continuity equation:

\frac{\partial p_{t} (x)}{\partial t} = - \nabla_{x} \cdot [v (x, t) p_{t} (x)]

2.2 Flow Matching Objective

Goal: Learn $v_{θ} (x, t)$ such that the generated probability path $p_{t} (x)$ matches the desired path.

Flow Matching Loss:

L_{FM} (θ) = E_{t \sim U [0, 1], x \sim p_{t} (x)} [∥ v_{θ} (x, t) - u_{t} (x) ∥^{2}]

where $u_{t} (x)$ is the target vector field that generates the desired probability path.

2.3 The Challenge

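The problem: $p_{t} (x)$ and $u_{t} (x)$ are defined by integrals over the unknown data distribution, so we have no closed form for the target $u_{t} (x)$ (and no direct handle on $p_{t} (x)$), and the FM loss cannot be evaluated directly.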

Solution: Use Conditional Flow Matching (CFM).

3. Conditional Flow Matching (CFM)

3.1 Key Insight

Instead of matching the marginal vector field $u_{t} (x)$ , we match conditional vector fields $u_{t} (x ∣ z)$ given some conditioning variable $z$ .

Conditional Flow Matching Loss:

L_{CFM} (θ) = E_{t, x \sim p_{t} (x ∣ z), z \sim p (z)} [∥ v_{θ} (x, t) - u_{t} (x ∣ z) ∥^{2}]

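Theorem (Lipman et al., 2023): Define the marginal path and the marginal vector field by averaging over $z$:

p_{t} (x) = \int p_{t} (x ∣ z) p (z) d z, u_{t} (x) = \int u_{t} (x ∣ z) \frac{p_{t} (x ∣ z) p (z)}{p_{t} (x)} d z

If $p_{t} (x) > 0$ for all $x$ and $t$, then $L_{FM}$ and $L_{CFM}$ differ only by a constant that does not depend on $θ$, so $\nabla_{θ} L_{FM} = \nabla_{θ} L_{CFM}$ and minimizing CFM also minimizes FM.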

3.2 Conditional Probability Path

Given data point $z = x_{1} \sim p_{1}$ , we define a conditional probability path:

p_{t} (x ∣ x_{1}) = N (x ∣ μ_{t} (x_{1}), σ_{t}^{2} (x_{1}) I)

where:

$μ_{t} (x_{1})$ : Time-dependent mean
$σ_{t} (x_{1})$ : Time-dependent standard deviation

Boundary conditions:

$t = 0$ : $p_{0} (x ∣ x_{1}) = N (x ∣ 0, I)$ (prior)
$t = 1$ : $p_{1} (x ∣ x_{1}) = δ (x - x_{1})$ (data point); in practice a narrow Gaussian $N (x ∣ x_{1}, σ_{min}^{2} I)$ is used so the path stays Gaussian

3.3 Conditional Vector Field

For the Gaussian conditional path, write the conditional flow as $ψ_{t} (x_{0}) = σ_{t} x_{0} + μ_{t}$, which maps $x_{0} \sim N (0, I)$ to a sample of $p_{t} (x ∣ x_{1})$. Its time derivative is $σ_{t}^{'} x_{0} + μ_{t}^{'}$, and substituting $x_{0} = (x - μ_{t}) / σ_{t}$ gives the conditional vector field:

u_{t} (x ∣ x_{1}) = \frac{σ_{t}^{'}}{σ_{t}} (x - μ_{t}) + μ_{t}^{'}

where $μ_{t}^{'} = \frac{d μ_{t}}{d t}$ and $σ_{t}^{'} = \frac{d σ_{t}}{d t}$ (both may depend on $x_{1}$).
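As a numerical sanity check of the marginalization argument, the sketch below builds the marginal field for a toy 1D "dataset" of a few equally weighted points, using the Gaussian path of Section 4.2 ($μ_{t} = t x_{1}$, $σ_{t} = 1 - (1 - σ_{min}) t$). The marginal field is the posterior-weighted average of the conditional fields, $u_{t} (x) = \sum_{k} p_{t} (x ∣ x_{1}^{(k)}) u_{t} (x ∣ x_{1}^{(k)}) / \sum_{k} p_{t} (x ∣ x_{1}^{(k)})$, and the script checks by finite differences that it satisfies the continuity equation for the marginal $p_{t} (x)$. The data points, grid, and function names are illustrative assumptions.

```python
import numpy as np

def gauss(x, mean, std):
    # Density of N(mean, std^2), evaluated elementwise
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def marginal_path(xs, t, data, sigma_min=0.0):
    """Marginal density p_t(x) and field u_t(x) for equally weighted 1D data points."""
    x = xs[:, None]                                   # shape (N, 1) against data shape (K,)
    sigma = 1.0 - (1.0 - sigma_min) * t
    p_cond = gauss(x, t * data, sigma)                # p_t(x | x1_k), shape (N, K)
    u_cond = (data - (1.0 - sigma_min) * x) / sigma   # u_t(x | x1_k), Section 4.2
    p = p_cond.mean(axis=1)                           # p_t(x) = sum_k w_k p_t(x | x1_k)
    u = (p_cond * u_cond).mean(axis=1) / p            # posterior-weighted average
    return p, u

data = np.array([-2.0, 0.5, 3.0])      # toy "dataset" of three points
xs = np.linspace(-6.0, 6.0, 2001)
t, dt = 0.4, 1e-5

p, u = marginal_path(xs, t, data)
dp_dt = (marginal_path(xs, t + dt, data)[0] - marginal_path(xs, t - dt, data)[0]) / (2 * dt)
div_pu = np.gradient(p * u, xs)        # d/dx [u_t(x) p_t(x)]
residual = np.max(np.abs(dp_dt + div_pu))
print(residual)  # only finite-difference error should remain
```

The residual is small compared with $\partial p_{t} / \partial t$ itself, which is what the continuity equation predicts for the marginal pair $(p_{t}, u_{t})$.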

4. Common Flow Designs

4.1 Optimal Transport Flow

Idea: Transport points along straight lines from noise to data.

Conditional path:

x_{t} = (1 - t) x_{0} + t x_{1}, x_{0} \sim N (0, I), x_{1} \sim p_{1}

Conditional vector field:

u_{t} (x ∣ x_{1}) = x_{1} - x_{0} = \frac{x_{1} - x_{t}}{1 - t}

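Advantages:

Straight conditional trajectories with constant velocity (easy to integrate)
Each conditional path is the optimal transport displacement from noise to its data point; the marginal flow is typically straighter than diffusion paths but is not guaranteed to be the global OT map
Fast sampling (typically fewer ODE steps than curved diffusion paths)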
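A minimal PyTorch sketch of the straight-line path for a batch, assuming data tensors of shape `(B, ...)` (e.g., images) and one time value per example; the helper name `ot_path` is hypothetical. Reshaping `t` lets it broadcast over the non-batch dimensions, and the two expressions for the target agree for $t < 1$.

```python
import torch

def ot_path(x0, x1, t):
    """Straight-line conditional path x_t = (1 - t) x0 + t x1 and its velocity.

    x0, x1: tensors of shape (B, ...); t: tensor of shape (B,) with values in [0, 1).
    """
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))   # (B, 1, ..., 1) for broadcasting
    xt = (1 - t_b) * x0 + t_b * x1
    ut = x1 - x0                                # constant velocity along each line
    return xt, ut

torch.manual_seed(0)
x0 = torch.randn(4, 3, 8, 8)                    # noise, e.g. a batch of 3x8x8 "images"
x1 = torch.randn(4, 3, 8, 8)                    # stand-in for data
t = torch.rand(4) * 0.9                         # keep t away from 1 for the check below
xt, ut = ot_path(x0, x1, t)
t_b = t.view(-1, 1, 1, 1)
# Equivalent form of the target: (x1 - x_t) / (1 - t)
assert torch.allclose(ut, (x1 - xt) / (1 - t_b), atol=1e-5)
```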

4.2 Gaussian Conditional Flow

Mean and standard deviation (setting $σ_{min} = 0$ recovers the straight-line path of Section 4.1):

μ_{t} (x_{1}) = t x_{1}, σ_{t} (x_{1}) = 1 - (1 - σ_{min}) t

Conditional vector field:

u_{t} (x ∣ x_{1}) = \frac{x_{1} - (1 - σ_{min}) x}{1 - (1 - σ_{min}) t}

where $σ_{min}$ is a small constant (e.g., $0.001$ ) for numerical stability.
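To check the closed form numerically, the sketch below (NumPy; function names are illustrative) differentiates the conditional flow $ψ_{t} (x_{0}) = t x_{1} + (1 - (1 - σ_{min}) t) x_{0}$ in time by central differences and compares the result with the formula above evaluated at $x = ψ_{t} (x_{0})$. Both also equal $x_{1} - (1 - σ_{min}) x_{0}$, a convenient form of the regression target.

```python
import numpy as np

def gaussian_flow(x0, x1, t, sigma_min=1e-3):
    # psi_t(x0) = mu_t + sigma_t * x0 with mu_t = t * x1, sigma_t = 1 - (1 - sigma_min) * t
    return t * x1 + (1 - (1 - sigma_min) * t) * x0

def gaussian_target(x, x1, t, sigma_min=1e-3):
    # Closed-form conditional vector field u_t(x | x1) of Section 4.2
    return (x1 - (1 - sigma_min) * x) / (1 - (1 - sigma_min) * t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)
x1 = rng.standard_normal(5)
t, h = 0.3, 1e-6

xt = gaussian_flow(x0, x1, t)
# Central finite difference of the trajectory in t
fd_velocity = (gaussian_flow(x0, x1, t + h) - gaussian_flow(x0, x1, t - h)) / (2 * h)
closed_form = gaussian_target(xt, x1, t)
```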

4.3 Variance Exploding Flow

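Similar to VE-[[Stochastic Differential Equation (SDE)|SDE]] in diffusion models, the mean stays at the data point while the noise level shrinks from $σ_{max}$ at $t = 0$ to $σ_{min}$ at $t = 1$ (this note's convention: $t = 0$ is noise, $t = 1$ is data):

μ_{t} (x_{1}) = x_{1}, σ_{t} = σ_{min} {(\frac{σ_{max}}{σ_{min}})}^{1 - t}

The conditional prior is therefore $p_{0} (x ∣ x_{1}) = N (x ∣ x_{1}, σ_{max}^{2} I)$, which approximates $N (0, σ_{max}^{2} I)$ when $σ_{max}$ is large compared with the data scale.

Conditional vector field (from $u_{t} (x ∣ x_{1}) = \frac{σ_{t}^{'}}{σ_{t}} (x - μ_{t}) + μ_{t}^{'}$ with $μ_{t}^{'} = 0$ and $\frac{σ_{t}^{'}}{σ_{t}} = - \log \frac{σ_{max}}{σ_{min}}$):

u_{t} (x ∣ x_{1}) = \log (\frac{σ_{max}}{σ_{min}}) (x_{1} - x)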

4.4 Comparison of Flow Designs

| Flow Type | Trajectory | Transport Cost | Sampling Speed | Stability |
| --- | --- | --- | --- | --- |
| OT Flow | Straight | Minimal | Very Fast | Good |
| Gaussian | Straight | Near-minimal | Fast | Very Good |
| VE Flow | Curved | Higher | Medium | Good |
| VP Flow | Curved | Moderate | Fast | Very Good |

5. Training Algorithm

5.1 Flow Matching Training

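```python
import torch
import torch.nn.functional as F

# Conditional Flow Matching training loss (Gaussian path of Section 4.2)
def flow_matching_loss(model, x1, t, sigma_min=1e-3):
    """
    model: Neural network v_theta(x, t)
    x1: Data samples from p_1, shape (B, ...)
    t: Time steps sampled from U[0, 1], shape (B,)
    sigma_min: Standard deviation of the conditional path at t = 1
    """
    # Reshape t to (B, 1, ..., 1) so it broadcasts against x1
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))

    # Sample noise from prior: x0 ~ N(0, I)
    x0 = torch.randn_like(x1)

    # Construct conditional path: mu_t = t * x1, sigma_t = 1 - (1 - sigma_min) * t
    mu_t = t_b * x1
    sigma_t = 1 - (1 - sigma_min) * t_b

    # Sample x_t ~ p_t(x | x1), reusing x0 so x_t and the target share the same noise
    xt = mu_t + sigma_t * x0

    # Target conditional vector field: d/dt (mu_t + sigma_t * x0)
    # With sigma_min = 0 this is the OT target x1 - x0
    ut = x1 - (1 - sigma_min) * x0

    # Predict velocity field and regress onto the target (MSE loss)
    v_theta = model(xt, t)
    return F.mse_loss(v_theta, ut)
```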

5.2 Complete Training Loop

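```python
# Assumes model, optimizer, dataloader and num_epochs are already defined
for epoch in range(num_epochs):
    for x1 in dataloader:
        # Sample one time per example: t ~ U[0, 1]
        t = torch.rand(x1.shape[0], device=x1.device)

        # Compute loss
        loss = flow_matching_loss(model, x1, t)

        # Update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```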
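For a fully runnable picture of the loop above, here is a self-contained toy sketch, assuming 1D data drawn from a two-mode Gaussian mixture and a small MLP; `VelocityMLP`, `sample_data`, and all hyperparameters are hypothetical choices, not values from this note. It uses the straight-line path ($σ_{min} = 0$) and finishes with Euler sampling.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class VelocityMLP(nn.Module):
    """Small MLP v_theta(x, t) for 1D toy data (hypothetical architecture)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t):
        # Append the time as an extra input feature: (B, 1) + (B, 1) -> (B, 2)
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def sample_data(n):
    # Toy p_1: equal mixture of N(-2, 0.1^2) and N(2, 0.1^2)
    signs = torch.randint(0, 2, (n, 1)).float() * 2 - 1
    return 2.0 * signs + 0.1 * torch.randn(n, 1)

model = VelocityMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
for step in range(2000):
    x1 = sample_data(256)
    x0 = torch.randn_like(x1)                        # prior samples
    t = torch.rand(x1.shape[0])                      # t ~ U[0, 1]
    xt = (1 - t[:, None]) * x0 + t[:, None] * x1     # straight-line path (sigma_min = 0)
    loss = ((model(xt, t) - (x1 - x0)) ** 2).mean()  # CFM regression onto x1 - x0
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

@torch.no_grad()
def euler_sample(model, x, num_steps=100):
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + model(x, t) * dt
    return x

samples = euler_sample(model, torch.randn(2000, 1))
```

The loss does not go to zero (the CFM target is noisy), but it should drop well below its initial value, and the samples should concentrate near the two modes at ±2; exact numbers depend on the seed and hardware.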

5.3 Key Differences from Diffusion Models

| Aspect | Diffusion Models | Flow Matching |
| --- | --- | --- |
| Objective | Score matching / ELBO | Vector field regression |
| Mathematical foundation | [[Stochastic Differential Equation (SDE)\|SDE]] theory | ODE theory |
| Target | Score function $\nabla \log p_{t} (x)$ | Velocity field $v (x, t)$ |
| Training | Denoising objective | Direct regression |
| Flexibility | Constrained by [[Stochastic Differential Equation (SDE)\|SDE]] | Arbitrary flow design |
| Likelihood | Tractable via ODE | Tractable via ODE |

6. Sampling Algorithm

6.1 ODE Integration

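```python
import torch
from scipy.integrate import solve_ivp

# Flow Matching sampling with SciPy's adaptive Runge-Kutta solver (RK45)
@torch.no_grad()
def sample(model, batch_size, dim, rtol=1e-5, atol=1e-5):
    """
    model: Trained velocity field network v_theta(x, t) on CPU,
           with x of shape (B, dim) and t of shape (B,)
    RK45 chooses its own step sizes from rtol/atol.
    """
    # Sample from prior
    x_0 = torch.randn(batch_size, dim)

    # Define ODE: solve_ivp works on flat float64 NumPy arrays, so convert in and out
    def ode_func(t, x_flat):
        x = torch.as_tensor(x_flat, dtype=torch.float32).view(batch_size, dim)
        t_vec = torch.full((batch_size,), t, dtype=torch.float32)
        return model(x, t_vec).reshape(-1).double().numpy()

    # Solve ODE from t=0 to t=1
    solution = solve_ivp(ode_func, (0.0, 1.0), x_0.reshape(-1).double().numpy(),
                         method="RK45", rtol=rtol, atol=atol)

    # Return final state (last column of solution.y is the state at t = 1)
    return torch.as_tensor(solution.y[:, -1], dtype=torch.float32).view(batch_size, dim)
```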

6.2 Euler Method (Simple)

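```python
import torch

@torch.no_grad()
def euler_sample(model, x_0, num_steps=100):
    """Integrate dx/dt = v_theta(x, t) from t=0 to t=1 with fixed-step Euler."""
    x_t = x_0
    dt = 1.0 / num_steps

    for i in range(num_steps):
        # Current time as a (B,) tensor, matching the model's t input
        t = torch.full((x_t.shape[0],), i * dt, device=x_t.device, dtype=x_t.dtype)
        v = model(x_t, t)
        x_t = x_t + v * dt  # Euler step

    return x_t
```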

6.3 Advanced ODE Solvers

| Solver | Order | Steps Needed | Characteristics |
| --- | --- | --- | --- |
| Euler | 1st | 100-200 | Simple, slow |
| RK4 | 4th | 50-100 | Accurate, moderate |
| DOPRI5 | 5th (adaptive) | 20-50 | Automatic step size |
| [[DPM-Solver]] | Specialized | 10-20 | Fast for diffusion |
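Between Euler and RK4 sits Heun's method (explicit trapezoidal rule, 2nd order), which spends two velocity evaluations per step. A minimal sketch, assuming the same `model(x, t)` interface with `t` of shape `(B,)`; `heun_sample` is a hypothetical name, and the usage checks it on $dx/dt = -x$, whose exact solution at $t = 1$ is $x_{0} e^{-1}$.

```python
import math
import torch

@torch.no_grad()
def heun_sample(model, x_0, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Heun's (2nd-order) method."""
    x = x_0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, dtype=x.dtype)
        v1 = model(x, t)                 # slope at the start of the step
        x_pred = x + v1 * dt             # Euler predictor
        v2 = model(x_pred, t + dt)       # slope at the end of the step
        x = x + 0.5 * (v1 + v2) * dt     # trapezoidal corrector
    return x

# Usage on a test field with a known solution: v(x, t) = -x, so x(1) = x(0) * e^{-1}
decay = lambda x, t: -x
x_0 = torch.ones(3, 2, dtype=torch.float64)
exact = x_0 * math.exp(-1.0)
heun_err = (heun_sample(decay, x_0, num_steps=10) - exact).abs().max().item()

# Fixed-step Euler with the same 10 steps, for comparison
x_e = x_0.clone()
for _ in range(10):
    x_e = x_e - x_e * 0.1
euler_err = (x_e - exact).abs().max().item()
```

With 10 steps, Heun's error is more than an order of magnitude smaller than Euler's at twice the number of model evaluations, which is the usual trade-off behind the step counts in the table.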

7. Theoretical Analysis

7.1 Equivalence to Score Matching

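Relation: For Gaussian probability paths, Flow Matching and score matching are closely related: the velocity field and the score function determine each other in closed form, so one model can be converted into the other.

For the conditional flow, using $\nabla_{x} \log p_{t} (x ∣ x_{1}) = - \frac{x - μ_{t}}{σ_{t}^{2}}$:

u_{t} (x ∣ x_{1}) = - σ_{t}^{'} σ_{t} \nabla_{x} \log p_{t} (x ∣ x_{1}) + μ_{t}^{'}

This shows the connection between velocity fields and score functions.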
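For the straight-line path of Section 4.1 ($μ_{t} = t x_{1}$, $σ_{t} = 1 - t$), eliminating $x_{1}$ from the conditional velocity $(x_{1} - x)/(1 - t)$ and the conditional score $-(x - t x_{1})/(1 - t)^{2}$ gives $\nabla_{x} \log p_{t} (x) = \frac{t v - x}{1 - t}$ for $t < 1$. The relation is affine in $v$, and both marginal fields are posterior averages of their conditional counterparts, so it should carry over to the marginal velocity learned by Flow Matching. The NumPy sketch below (illustrative names, toy 1D data) checks this against a finite-difference score.

```python
import numpy as np

def velocity_to_score(v, x, t):
    """Score of the straight-line path x_t = (1 - t) x0 + t x1, x0 ~ N(0, I),
    recovered from its velocity: grad_x log p_t(x) = (t * v - x) / (1 - t), 0 <= t < 1."""
    return (t * v - x) / (1.0 - t)

# Marginal velocity of a 1D mixture of equally weighted points at time t
data = np.array([-2.0, 0.5, 3.0])
t = 0.4
sigma = 1.0 - t
xs = np.linspace(-4.0, 4.0, 1601)
w = np.exp(-0.5 * ((xs[:, None] - t * data) / sigma) ** 2)  # unnormalized p_t(x | x1_k)
p = w.sum(axis=1)                                           # marginal, up to a constant
v = (w * (data - xs[:, None]) / sigma).sum(axis=1) / p      # posterior-averaged velocity

score_fd = np.gradient(np.log(p), xs, edge_order=2)         # finite-difference score
score_from_v = velocity_to_score(v, xs, t)
max_err = np.max(np.abs(score_fd - score_from_v))
```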

7.2 Likelihood Computation

Using the instantaneous change of variables formula:

\frac{d}{d t} \log p_{t} (x (t)) = - \nabla_{x} \cdot v (x (t), t)

Integrating from $t = 0$ to $t = 1$ :

\log p_{1} (x_{1}) = \log p_{0} (x_{0}) - \int_{0}^{1} \nabla_{x} \cdot v (x (t), t) d t

Divergence computation:

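Exact: $d$ vector-Jacobian products per evaluation, roughly $O (d^{2})$ for $d$-dimensional data
Hutchinson’s estimator: one vector-Jacobian product per random probe, roughly $O (d)$; unbiased but stochastic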
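Both divergence options can be sketched in PyTorch as follows (function names are hypothetical; `f` is any batched vector field, e.g. `lambda x: model(x, t)` at a fixed time). The exact version takes one vector-Jacobian product per dimension; Hutchinson's estimator uses $\nabla \cdot f = E_{ε} [ε^{\top} J ε]$ with Rademacher probes $ε$, one vector-Jacobian product per probe.

```python
import torch

def divergence_exact(f, x):
    """Exact divergence of a batched field f at x of shape (B, d): d vector-Jacobian products."""
    x = x.detach().requires_grad_(True)
    out = f(x)
    div = torch.zeros(x.shape[0], dtype=x.dtype)
    for i in range(x.shape[1]):
        # Row i of each sample's Jacobian; keep the diagonal entry d out_i / d x_i
        grad_i = torch.autograd.grad(out[:, i].sum(), x, retain_graph=True)[0]
        div = div + grad_i[:, i]
    return div

def divergence_hutchinson(f, x, num_probes=1):
    """Unbiased stochastic estimate E[eps^T J eps] with Rademacher probes eps."""
    x = x.detach().requires_grad_(True)
    out = f(x)
    est = torch.zeros(x.shape[0], dtype=x.dtype)
    for _ in range(num_probes):
        eps = torch.randint(0, 2, x.shape).to(x.dtype) * 2 - 1
        vjp = torch.autograd.grad(out, x, grad_outputs=eps, retain_graph=True)[0]  # eps^T J
        est = est + (vjp * eps).sum(dim=1)
    return est / num_probes

# Usage: a linear field f(x) = x A^T has divergence trace(A) at every point
torch.manual_seed(0)
A = torch.randn(4, 4)
f = lambda x: x @ A.T
x = torch.randn(8, 4)
exact = divergence_exact(f, x)
approx = divergence_hutchinson(f, x, num_probes=2000)
```

In likelihood training a single probe per sample is common, since the estimate only needs to be unbiased; many probes are used here only to make the check visible.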

7.3 Optimal Transport Connection

Benamou-Brenier Formula:

The Wasserstein-2 distance between $p_{0}$ and $p_{1}$ can be expressed as:

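W_{2}^{2} (p_{0}, p_{1}) = \inf_{v} \int_{0}^{1} \int ∥ v (x, t) ∥^{2} p_{t} (x) d x d t

subject to the continuity equation with boundary marginals $p_{0}$ and $p_{1}$.

The OT conditional paths minimize this cost for each conditional pair, which makes the conditional trajectories straight; the marginal flow learned by Flow Matching is generally not the exact OT map, which motivates the rectification procedure below.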

7.4 Rectified Flows

Key Idea: Iteratively straighten the flow trajectories.

Algorithm:

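Train an initial Flow Matching model
Generate coupled pairs $(x_{0}, x_{1})$ by sampling $x_{0} \sim p_{0}$ and integrating the learned ODE to $t = 1$
Retrain the model on these pairs with straight-line interpolation: $x_{t} = (1 - t) x_{0} + t x_{1}$
Repeat steps 2-3, typically 2-3 times

Result: Nearly straight trajectories, enabling generation with very few steps, down to a single step (often after an additional distillation stage).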
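A toy illustration of step 2 and of why reflow straightens paths, under loudly simplifying assumptions: 1D data, $p_{0} = N (0, 1)$, $p_{1} = N (3, 0.5^{2})$, and velocity fields assumed to be learned perfectly. For independent pairs on the straight-line path, $x_{t}$ is Gaussian with mean $3 t$ and variance $s_{t}^{2} = (1 - t)^{2} + 0.25 t^{2}$, and the exact marginal velocity is $v (x, t) = 3 + \frac{s_{t}^{'}}{s_{t}} (x - 3 t)$. Integrating it pairs each $x_{0}$ with $x_{1} = 3 + 0.5 x_{0}$ (in 1D the flow map is monotone). A field retrained on these pairs has exactly straight trajectories, so a single Euler step reproduces the coupling. Names such as `reflow_pairs` are hypothetical.

```python
import numpy as np

def velocity_1rf(x, t):
    # Exact marginal velocity E[x1 - x0 | x_t = x] for independent x0 ~ N(0, 1), x1 ~ N(3, 0.25)
    s2 = (1 - t) ** 2 + 0.25 * t ** 2          # Var(x_t)
    ds_over_s = (0.25 * t - (1 - t)) / s2      # s_t' / s_t
    return 3.0 + ds_over_s * (x - 3.0 * t)

def reflow_pairs(velocity, x0, num_steps=1000):
    """Step 2: push prior samples through the current flow (Euler) to get coupled pairs."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + velocity(x, i * dt) * dt
    return x0, x

x0 = np.linspace(-3.0, 3.0, 13)
x0, x1 = reflow_pairs(velocity_1rf, x0)      # approximately x1 = 3 + 0.5 * x0

def velocity_2rf(x, t):
    # Field assumed learned perfectly on the coupled pairs: along the line
    # x_t = (1 - t) x0 + t (3 + 0.5 x0) the velocity is the constant 3 - 0.5 x0
    x0_of_x = (x - 3.0 * t) / (1.0 - 0.5 * t)
    return 3.0 - 0.5 * x0_of_x

one_step = x0 + velocity_2rf(x0, 0.0) * 1.0   # a single Euler step from t = 0 to t = 1
```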

8. Advanced Variants

8.1 Rectified Flow

Motivation: Straight trajectories are easier to integrate.

Method:

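Regress $v_{θ} (x_{t}, t)$ onto the straight-line velocity $x_{1} - x_{0}$ between coupled pairs $(x_{0}, x_{1})$
Iteratively “rectify” the flow by regenerating the pairs with the current model
Achieve 1-2 step generation with high quality after rectification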

Loss:

L = E [∥ v_{θ} (x_{t}, t) - (x_{1} - x_{0}) ∥^{2}]

8.2 Flow Matching with Prior Blending

Idea: Use learned prior instead of fixed Gaussian.

p_{0} (x) = \text{VAE latent distribution}

Benefits:

Lower transport cost
Faster convergence
Better sample quality

8.3 Multimodal Flow Matching

Challenge: Standard flows are deterministic mappings (bijective).

Solution: Use mixture of flows or stochastic interpolation.

x_{t} \sim \sum_{k} w_{k} N (x ∣ μ_{t}^{(k)}, (σ_{t}^{(k)})^{2} I)

8.4 Comparison Table

| Variant | Trajectory | Steps | Quality | Training Cost |
| --- | --- | --- | --- | --- |
| Standard FM | Curved | 50-100 | High | Low |
| Rectified Flow | Straight | 1-10 | Very High | Medium |
| Prior Blending | Curved | 30-50 | Very High | Medium |
| Multimodal FM | Curved | 50-100 | High | High |

9. Applications

9.1 Text-to-Image Generation

Stable Diffusion + Flow Matching:

Replace diffusion ODE with Flow Matching
Faster training (no score matching)
Flexible flow design
Comparable or better FID scores

Example: SD3 (Stable Diffusion 3) uses rectified flows.

9.2 Molecular Generation

Advantages:

Continuous representation of molecules
Exact likelihood computation
Flexible prior design
Fast sampling

9.3 Audio Synthesis

Benefits:

High-fidelity audio generation
Faster than diffusion models
Controllable generation via conditioning

9.4 Video Generation

Temporal Flow Matching:

Model spatiotemporal dynamics
Straight trajectories reduce artifacts
Efficient sampling for long sequences

9.5 3D Generation

Point Cloud / Mesh Generation:

Continuous 3D structure modeling
Optimal transport preserves geometry
Fast generation for interactive applications

10. Practical Implementation

10.1 Network Architecture

Common choices:

U-Net (from diffusion models):
- Proven architecture
- Multi-scale processing
- Attention mechanisms
Transformer:
- Global receptive field
- Scalable to high dimensions
- Good for sequential data
MLP (for low-dimensional data):
- Simple and efficient
- Good for toy examples

Time embedding:

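```python
import math
import torch
import torch.nn as nn

class SinusoidalTimeEmbedding(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim  # should be even: half sine, half cosine features

    def forward(self, t):
        # Sinusoidal embedding of t (shape (B,)) into (B, dim)
        # With t in [0, 1], some implementations scale t (e.g., by 1000) first
        device = t.device
        half_dim = self.dim // 2
        # Geometric progression of frequencies from 1 down to 1/10000
        embeddings = math.log(10000) / (half_dim - 1)
        embeddings = torch.exp(torch.arange(half_dim, device=device) * -embeddings)
        # Outer product: one row of phases per time value
        embeddings = t[:, None] * embeddings[None, :]
        embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
        return embeddings
```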

10.2 Training Best Practices

1. Time sampling:

Uniform: $t \sim U [0, 1]$
Non-uniform (importance) sampling: more weight on harder, typically intermediate, values of $t$ (e.g., the logit-normal sampling used in SD3)

2. Data normalization:

Normalize data to $[- 1, 1]$ or $N (0, 1)$
Ensure numerical stability

3. Learning rate scheduling:

Warmup: Gradually increase LR
Cosine decay: Smooth decrease

4. Batch size:

Larger batches = more stable gradients
Typical: 64-256
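One concrete non-uniform choice for time sampling is the logit-normal distribution used in SD3: draw $n \sim N (m, s^{2})$ and set $t = \mathrm{sigmoid} (n)$, which concentrates training on intermediate times. A minimal sketch; the function name and the defaults $m = 0$, $s = 1$ are illustrative assumptions.

```python
import torch

def sample_t_logit_normal(batch_size, m=0.0, s=1.0, device=None):
    """Sample t in (0, 1) as sigmoid(n) with n ~ N(m, s^2)."""
    n = m + s * torch.randn(batch_size, device=device)
    return torch.sigmoid(n)

torch.manual_seed(0)
t = sample_t_logit_normal(10000)
# Fraction of samples in the middle half of [0, 1]; U[0, 1] would give 0.5
middle_fraction = ((t > 0.25) & (t < 0.75)).float().mean().item()
```

It can replace `torch.rand(x1.shape[0], device=x1.device)` in the training loop without other changes, since the loss itself is unchanged.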

10.3 Debugging Checklist

[ ] Verify boundary conditions: $p_{0} = N (0, I)$ , $p_{1} \approx$ data
[ ] Check trajectory continuity (no jumps)
[ ] Monitor loss convergence
[ ] Test ODE solver with different step sizes
[ ] Validate likelihood computation
[ ] Compare sample quality with baseline

11. Comparison with Other Methods

11.1 Flow Matching vs [[Diffusion Model|Diffusion Models]]

| Aspect | Flow Matching | Diffusion Models |
| --- | --- | --- |
| Foundation | ODE theory | [[Stochastic Differential Equation (SDE)\|SDE]] theory |
| Objective | Vector field regression | Score matching / ELBO |
| Flexibility | High (arbitrary flows) | Constrained (noise schedule) |
| Training | Simple regression | Complex derivation |
| Sampling | ODE integration | [[Stochastic Differential Equation (SDE)\|SDE]]/ODE integration |
| Likelihood | Exact | Exact (via ODE) |
| Theory | Simpler | More complex |

11.2 Flow Matching vs [[Continuous Normalizing Flow]]

| Aspect | Flow Matching | Traditional CNF |
| --- | --- | --- |
| Training | Simulation-free | Requires ODE simulation |
| Speed | Fast | Slow (backprop through ODE) |
| Architecture | Flexible | Constrained (trace computation) |
| Scalability | High | Limited |

11.3 Flow Matching vs GAN

| Aspect | Flow Matching | GAN |
| --- | --- | --- |
| Training stability | Stable (MSE loss) | Unstable (minimax game) |
| Mode coverage | Good (no adversarial mode collapse) | Mode collapse possible |
| Likelihood | Exact | Intractable |
| Sample quality | High | High |
| Sampling speed | Medium (ODE steps) | Fast (1 step) |

11.4 Generative Model Comparison

| Model | Training | Sampling | Likelihood | Stability | Quality |
| --- | --- | --- | --- | --- | --- |
| GAN | Adversarial | 1 step | Intractable | Unstable | High |
| VAE | ELBO | 1 step | Lower bound | Stable | Medium |
| Normalizing Flow | Likelihood | Parallel | Exact | Stable | Medium-High |
| Diffusion | Score matching | 50-1000 steps | Exact | Stable | Very High |
| Flow Matching | Vector regression | 10-100 steps | Exact | Stable | Very High |

12. Core Formula Cards

> [!QUOTE] Flow Matching Objective
> $L_{FM} (θ) = E_{t, x \sim p_{t} (x)} [∥ v_{θ} (x, t) - u_{t} (x) ∥^{2}]$

> [!QUOTE] Conditional Flow Matching
> $L_{CFM} (θ) = E_{t, x \sim p_{t} (x ∣ z), z} [∥ v_{θ} (x, t) - u_{t} (x ∣ z) ∥^{2}]$

> [!QUOTE] Optimal Transport Flow
> $x_{t} = (1 - t) x_{0} + t x_{1}, u_{t} (x ∣ x_{1}) = x_{1} - x_{0}$

> [!QUOTE] Gaussian Conditional Flow
> $μ_{t} (x_{1}) = t x_{1}, σ_{t} = 1 - (1 - σ_{min}) t$
> $u_{t} (x ∣ x_{1}) = \frac{x_{1} - (1 - σ_{min}) x}{1 - (1 - σ_{min}) t}$

> [!QUOTE] Continuity Equation
> $\frac{\partial p_{t} (x)}{\partial t} = - \nabla_{x} \cdot [v (x, t) p_{t} (x)]$

> [!QUOTE] Likelihood Computation
> $\log p_{1} (x_{1}) = \log p_{0} (x_{0}) - \int_{0}^{1} \nabla_{x} \cdot v (x (t), t) d t$

13. Recent Advances (2023-2024)

13.1 Rectified Flow

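Key Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al., 2022); closely related concurrent work: Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023)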

Contributions:

Iterative straightening of trajectories
1-2 step generation with high quality
Connections to optimal transport

13.2 Flow Matching for Large-Scale Generation

SD3 (Stable Diffusion 3):

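Uses a rectified-flow formulation (straight paths between noise and data) instead of a standard diffusion formulation
Reports better sample quality than established diffusion formulations, aided by logit-normal timestep sampling
Faster training and sampling
Multimodal conditioning (MMDiT transformer with separate text and image streams)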

13.3 Flow Matching + Consistency Models

Idea: Combine Flow Matching with consistency models for 1-step generation.

Method:

Train Flow Matching model
Distill to consistency model
Achieve 1-step generation

13.4 Multimodal Flow Matching

Challenge: Standard flows are deterministic.

Solutions:

Mixture of flows
Stochastic interpolation
Latent variable models

Related Concepts

[[Diffusion Model]]
[[Continuous Normalizing Flow]]
[[Probability Flow ODE]]
[[Stochastic Differential Equation (SDE)]]
[[Fokker-Planck Equation]]
[[Optimal Transport]]
[[Rectified Flows]]
[[Score Function]]
[[Neural ODE]]
[[DPM-Solver]]
[[DDIM]]
[[Wiener Process|Wiener Process]]
[[Markov Process]]
[[U-Net]]
[[Generative Adversarial Network (GAN)]]

Dataview Query

```dataview
LIST
FROM #flow_matching OR #continuous_normalizing_flow OR #generative_model
SORT file.ctime DESC
```

References

Paper: Flow Matching for Generative Modeling (Lipman et al., 2023)
Paper: Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023)
Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al., 2022)
Paper: Action Matching: Learning Stochastic Dynamics From Samples (Neklyudov et al., 2022)
Paper: SE(3)-Stochastic Flow Matching for Protein Backbone Generation (Bose et al., 2023)
Blog: Flow Matching: A New Paradigm for Generative Modeling - Lilian Weng
Course: CS236 Deep Generative Models (Stanford)
GitHub: https://github.com/atong01/conditional-flow-matching
